Mining Order-Preserving SubMatrices from Probabilistic Matrices

Authors

  • QIONG FANG
  • WILFRED NG
  • JIANLIN FENG
  • YULIANG LI
Abstract

Order-Preserving SubMatrices (OPSMs) capture consensus trends over columns shared by rows in a data matrix. Mining OPSM patterns discovers important and interesting local correlations in many real applications, such as those involving biological data or sensor data. The prevalence of uncertain data in various applications, however, poses new challenges for OPSM mining, since data uncertainty must be incorporated into both the OPSM model and the mining algorithms. In this paper, we define new probabilistic matrix representations to model uncertain data with continuous distributions. A novel Probabilistic Order-Preserving SubMatrix (POPSM) model is formalized in order to capture similar local correlations in probabilistic matrices. The POPSM model adopts a new probabilistic support measure that evaluates the extent to which a row belongs to a POPSM pattern. Due to the intrinsically high computational complexity of the POPSM mining problem, we utilize the anti-monotonic property of the probabilistic support measure and propose an efficient Apriori-based mining framework called PROBAPRI to mine POPSM patterns. The framework consists of two mining methods, UNIAPRI and NORMAPRI, which are developed for mining POPSM patterns from two representative types of probabilistic matrices: the UniDist matrix (assuming uniform data distributions) and the NormDist matrix (assuming normal data distributions), respectively. We show that the NORMAPRI method is practical enough for mining POPSM patterns from probabilistic matrices that model more general data distributions. We demonstrate the superiority of our approach through two applications. First, we use two biological datasets to illustrate that the POPSM model better captures the characteristics of the expression levels of biologically correlated genes, and greatly promotes the discovery of patterns with high biological significance. Our result is significantly better than that of the counterpart OPSMRM (OPSM with Repeated Measurement) model, which adopts a set-valued matrix representation to capture data uncertainty. Second, we run experiments on an RFID trace dataset and show that our POPSM model is effective and efficient in capturing the common visiting subroutes among users.
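
The ideas in the abstract can be illustrated with a small Python sketch. It is not the paper's PROBAPRI, UNIAPRI, or NORMAPRI method, and the abstract does not give the exact definition of the probabilistic support measure. The function names (`supports_order`, `opsm_rows`, `order_probability`) are hypothetical, and the sketch assumes that each uncertain entry is an independent uniform interval and that the quantity of interest is the probability that a row's values increase along a given column order, estimated here by Monte Carlo sampling.

```python
import random


def supports_order(row, columns):
    """Return True if the row's values strictly increase along `columns`."""
    return all(row[a] < row[b] for a, b in zip(columns, columns[1:]))


def opsm_rows(matrix, columns):
    """Deterministic case: indices of rows that follow the column order."""
    return [i for i, row in enumerate(matrix) if supports_order(row, columns)]


def order_probability(intervals, columns, samples=20000, seed=0):
    """Estimate the probability that an uncertain row follows the column order.

    Assumption (not from the paper): `intervals` holds one (low, high) pair
    per column, and each entry is drawn independently and uniformly from it.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible realization of the whole row.
        drawn = [rng.uniform(low, high) for low, high in intervals]
        if supports_order(drawn, columns):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    matrix = [[1, 3, 2], [2, 5, 4], [3, 1, 2]]
    print(opsm_rows(matrix, [0, 2, 1]))

    row = [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0)]
    print(order_probability(row, [0, 1]))
    print(order_probability(row, [0, 1, 2]))
```

Because every realization that follows a longer column order also follows each of its prefixes, the estimate for `[0, 1, 2]` can never exceed the estimate for `[0, 1]` when the same seed is used. This is the kind of anti-monotonic behavior that lets an Apriori-style search discard a column order, together with all of its extensions, once its support falls below a threshold.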


Similar resources

Tendency based Subspace Clustering on Gene Expression Data

Microarrays are one of the latest breakthroughs in experimental molecular biology. By monitoring expressions of different genes under different experiments, a large matrix representing the gene expression levels of varying experiments will be produced. To reveal patterns in such matrices, Ben-Dor et al. introduced a probabilistic model to discover the strictly order-preserving submatrix (OPSM) ...


A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences

Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, because the underlying problem is NP-complete, most existing methods are heuristic algorithms that are unable to reveal all OPSMs. In particular, deep OPSMs, corresponding to long pa...


Eigenvectors of block circulant and alternating circulant matrices

The eigenvectors and eigenvalues of block circulant matrices had been found for real symmetric matrices with symmetric submatrices, and for block circulant matrices with circulant submatrices. The eigenvectors are now found for general block circulant matrices, including the Jordan Canonical Form for defective eigenvectors. That analysis is applied to Stephen J. Watson’s alternating circulant m...


Extending the Order Preserving Submatrix: New patterns in datasets

This paper is concerned with finding local patterns in gene expression datasets. We present new order relation patterns, and develop algorithms which find those patterns. Our algorithms are the first to find exact results for those patterns, yet in most cases they outperform existing heuristic algorithms. Finally we present an algorithm for the broader problem of frequent itemset min...


Efficient Removal Lemmas for Matrices

The authors and Fischer recently proved that any hereditary property of two-dimensional matrices (where the row and column order is not ignored) over a finite alphabet is testable with a constant number of queries, by establishing the following (ordered) matrix removal lemma: For any finite alphabet Σ, any hereditary property P of matrices over Σ, and any ε > 0, there exists f_P(ε) such that for a...




Publication date: 2013